perf: zero copy validity export to duckdb #7371
Polar Signals Profiling Results

Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.966x ➖
datafusion / vortex-file-compressed (0.966x ➖, 1↑ 0↓)

File Sizes: PolarSignals Profiling
No file size changes detected.

Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.916x ➖, 3↑ 0↓)
datafusion / vortex-compact (0.922x ➖, 2↑ 0↓)
datafusion / parquet (0.890x ✅, 7↑ 0↓)
duckdb / vortex-file-compressed (0.938x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.953x ➖, 1↑ 1↓)
duckdb / parquet (0.914x ➖, 1↑ 0↓)

File Sizes: FineWeb NVMe
No file size changes detected.

Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.855x ✅, 19↑ 0↓)
datafusion / vortex-compact (0.912x ➖, 11↑ 0↓)
datafusion / parquet (0.839x ✅, 20↑ 1↓)
datafusion / arrow (0.838x ✅, 13↑ 3↓)
duckdb / vortex-file-compressed (0.919x ➖, 3↑ 0↓)
duckdb / vortex-compact (1.003x ➖, 0↑ 0↓)
duckdb / parquet (0.937x ➖, 3↑ 0↓)
duckdb / duckdb (0.866x ✅, 15↑ 0↓)

File Sizes: TPC-H SF=1 on NVME
File Size Changes (195 files changed, -98.4% overall, 0↑ 195↓)

Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.011x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.004x ➖, 1↑ 0↓)
datafusion / parquet (1.008x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.019x ➖, 1↑ 6↓)
duckdb / vortex-compact (0.995x ➖, 3↑ 2↓)
duckdb / parquet (0.993x ➖, 2↑ 0↓)
duckdb / duckdb (1.023x ➖, 1↑ 2↓)

File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.

Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.056x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.016x ➖, 0↑ 0↓)
datafusion / parquet (1.092x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.973x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.973x ➖, 0↑ 0↓)
duckdb / parquet (1.020x ➖, 0↑ 0↓)

Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.055x ➖, 0↑ 2↓)
datafusion / vortex-compact (0.965x ➖, 2↑ 0↓)
datafusion / parquet (0.985x ➖, 0↑ 0↓)
datafusion / arrow (0.965x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.005x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.984x ➖, 0↑ 0↓)
duckdb / parquet (1.010x ➖, 0↑ 1↓)
duckdb / duckdb (0.979x ➖, 0↑ 0↓)

File Sizes: TPC-H SF=10 on NVME
No file size changes detected.

Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.138x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.020x ➖, 0↑ 0↓)
datafusion / parquet (1.088x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.040x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.079x ➖, 0↑ 0↓)
duckdb / parquet (1.041x ➖, 0↑ 0↓)

Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (0.971x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.974x ➖, 0↑ 0↓)
duckdb / parquet (0.989x ➖, 0↑ 0↓)

File Sizes: Statistical and Population Genetics
No file size changes detected.

Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (confidence: environment too noisy)
datafusion / vortex-file-compressed (1.017x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.028x ➖, 0↑ 2↓)
datafusion / parquet (1.054x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.927x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.963x ➖, 0↑ 0↓)
duckdb / parquet (0.949x ➖, 0↑ 0↓)

Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.942x ➖, 5↑ 0↓)
datafusion / parquet (0.969x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.958x ➖, 4↑ 1↓)
duckdb / parquet (0.979x ➖, 0↑ 0↓)
duckdb / duckdb (0.965x ➖, 3↑ 3↓)

File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)

Benchmarks: Compression
Vortex (geomean): 0.991x ➖
unknown / unknown (1.011x ➖, 4↑ 13↓)

Benchmarks: Random Access
Vortex (geomean): 1.163x ❌
unknown / unknown (1.136x ❌, 0↑ 26↓)
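
The figures above are ratios of this PR against the baseline; for the timing results, values below 1.0x mean faster runs. Assuming a "geomean" line is the geometric mean of per-query ratios (the bot's exact aggregation is not shown in this page), it can be sketched as follows. `geomean_ratio` is a hypothetical helper, not part of the benchmark tooling.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: geometric mean of per-query time ratios
// (new / baseline). A result below 1.0 means faster overall.
// Summing logarithms avoids overflow/underflow across many queries.
double geomean_ratio(const std::vector<double>& ratios) {
    assert(!ratios.empty());
    double log_sum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0);  // a ratio of two run times is always positive
        log_sum += std::log(r);
    }
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

For example, a 0.5x query and a 2.0x query combine to a geomean of 1.0x because the speedup and the slowdown cancel, whereas an arithmetic mean would report 1.25x.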

// Set the validity pointer for the vector to external data, and store the buffer in auxiliary
// to keep it alive. This enables zero-copy export of validity masks.
void duckdb_vx_vector_set_validity_data(duckdb_vector ffi_vector, void *validity_ptr, idx_t capacity,
If validity_ptr points to the buffer, just pass the buffer.
validity_ptr is not the buffer; it is something a few levels of pointer indirection deep. We could fix that, but we would also want to change the Primitive/Decimal export at the same time.
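
The mechanism described in the comment on `duckdb_vx_vector_set_validity_data` (point the vector's validity at externally owned memory, and store the owning buffer alongside it so the memory outlives the vector) can be modelled in isolation. This is a minimal sketch, not DuckDB's API: `ExportedVector`, `set_external_validity`, `row_is_valid`, and the use of `std::shared_ptr<const void>` as the keep-alive handle are hypothetical stand-ins for DuckDB's vector, its validity mask, and its auxiliary buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using validity_t = std::uint64_t;  // 64 rows per word, one bit per row

// Hypothetical stand-in for an engine vector: it reads validity through a
// raw pointer it does not own, plus a handle that keeps the owner alive.
struct ExportedVector {
    const validity_t* validity = nullptr;    // borrowed bitmask words
    std::size_t capacity = 0;                // rows covered by the mask
    std::shared_ptr<const void> keep_alive;  // owner of the bitmask memory
};

// Zero-copy: point at the producer's words and retain their owner,
// instead of copying the bits into vector-owned storage.
void set_external_validity(ExportedVector& vec, const validity_t* bits,
                           std::size_t capacity,
                           std::shared_ptr<const void> owner) {
    vec.validity = bits;
    vec.capacity = capacity;
    vec.keep_alive = std::move(owner);
}

// Row i is valid if bit (i % 64) of word (i / 64) is set.
bool row_is_valid(const ExportedVector& vec, std::size_t row) {
    assert(row < vec.capacity);
    return ((vec.validity[row / 64] >> (row % 64)) & 1u) != 0;
}
```

No bits are copied: the vector reads the producer's words in place, and reference counting, rather than a copy, keeps them valid after the producer drops its own handle. The pointer is `const` in this sketch because the consumer only reads the mask.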
// Same hack for ValidityMask: access protected fields via inheritance.
class ExternalValidityMask : public ValidityMask {
public:
    inline void SetExternal(validity_t *ptr, idx_t cap,
Same here: pass just the buffer and derive the pointer from it.
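
The reviewer's suggestion (pass only the owning buffer and derive the raw pointer from it) removes any chance of the pointer and its owner getting out of sync. A minimal sketch, assuming a hypothetical owning `Buffer` type; `Buffer`, `ExportedVector`, and `set_validity_buffer` are illustrative names, not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using validity_t = std::uint64_t;

// Hypothetical owning buffer, standing in for whatever the exporter holds.
class Buffer {
public:
    explicit Buffer(std::vector<validity_t> words) : words_(std::move(words)) {}
    const validity_t* data() const { return words_.data(); }
    std::size_t size() const { return words_.size(); }

private:
    std::vector<validity_t> words_;
};

struct ExportedVector {
    const validity_t* validity = nullptr;      // derived from keep_alive
    std::size_t capacity = 0;                  // rows covered by the mask
    std::shared_ptr<const Buffer> keep_alive;  // owner of the words
};

// Take only the buffer; the raw pointer is derived from it, so a caller
// cannot pass a pointer that belongs to a different owner.
void set_validity_buffer(ExportedVector& vec, std::shared_ptr<const Buffer> buf,
                         std::size_t capacity) {
    assert(buf != nullptr);
    assert(capacity <= buf->size() * 64);  // the mask must cover every row
    vec.validity = buf->data();
    vec.capacity = capacity;
    vec.keep_alive = std::move(buf);
}
```

The trade-off is that the function now depends on the concrete buffer type, which is why such a change would ripple into every export path that hands over a raw pointer.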
*ext_buf, reinterpret_cast<TemplatedValidityData<validity_t> *>(ext_buf->get()));

// Set validity_mask, capacity, and validity_data (which keeps the buffer alive).
ext_validity->SetExternal(reinterpret_cast<validity_t *>(validity_ptr), capacity,
Technically this will slice the class to the base's validity, but since the derived class doesn't add any members, it's fine. Worth adding a comment.
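
The inheritance hack works because a derived class with no data members of its own adds nothing to the object, which is exactly the assumption the review asks to document. Strictly speaking, though, treating an existing base object as the derived type is not something the C++ standard guarantees. A well-defined alternative (swapped in here for illustration, not what the PR does) is to form a pointer-to-member through the derived class and apply it to the base object. `Mask`, `Access`, `bits_`, and `owner_` are hypothetical names, not DuckDB's `ValidityMask` fields.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

using validity_t = std::uint64_t;

// Hypothetical class with protected state, standing in for a mask type
// whose fields cannot be set through its public API.
class Mask {
public:
    const validity_t* bits() const { return bits_; }

protected:
    validity_t* bits_ = nullptr;
    std::shared_ptr<void> owner_;  // keeps external memory alive
};

// A derived class may name protected members, and &Access::bits_ has type
// `validity_t* Mask::*`. Applying that pointer-to-member to a plain Mask
// object is well-defined, so no cast from Mask& to Access& is needed.
struct Access : Mask {
    static void set_external(Mask& m, validity_t* ptr, std::shared_ptr<void> owner) {
        m.*(&Access::bits_) = ptr;
        m.*(&Access::owner_) = std::move(owner);
    }
};
```

Because no object is ever treated as an `Access`, the helper stays correct even if it later gains members of its own.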
…ty-export
Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
Export validity zero-copy, similarly to how we export data for Primitive and Decimal.